
The size of this SAM file is about 19G. We can convert it to the BAM format using Samtools

to save some storage space. First, you need to change to the “sam” directory and then run

the following:

samtools view \
    -uS \
    -o SRR769545_mem.bam \
    SRR769545_mem.sam

The new BAM file is about 15G. We can then delete the SAM file as follows to save some

storage space:

rm SRR769545_mem.sam
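The conversion and clean-up steps above can also be combined so that the SAM file is removed only after the BAM file has been written and checked. The following is a minimal sketch rather than the authors' procedure: the function name "sam_to_bam_checked" is hypothetical, and it assumes you would rather write a compressed BAM file with "-b" than the uncompressed output produced by "-u". The "samtools quickcheck" subcommand performs only a basic integrity check on the BAM file.

```shell
# Hypothetical helper: convert a SAM file to compressed BAM and delete
# the SAM file only if the conversion and a basic integrity check succeed.
sam_to_bam_checked() {
    local sam="$1"
    local bam="${sam%.sam}.bam"   # same name with a .bam extension

    # -b writes compressed BAM (assumption: compression is preferred over -u)
    samtools view -b -o "$bam" "$sam" || return 1

    # quickcheck verifies that the BAM header and end-of-file marker are intact
    samtools quickcheck "$bam" || return 1

    rm -- "$sam"
}

# Usage (run inside the "sam" directory):
#   sam_to_bam_checked SRR769545_mem.sam
```

Because each step returns early on failure, the original SAM file is left in place if Samtools reports a problem.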

BWA-MEM2 is an optimized BWA-MEM algorithm that has been recently released. This

new version produces alignments identical to BWA-MEM but it is faster and the indexing

occupies less storage space and memory [18]. You can install BWA-MEM2 separately by

following the instructions available at “https://github.com/bwa-mem2/bwa-mem2”.
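As a rough illustration of how BWA-MEM2 might be run on the same data, the sketch below wraps the two steps in a function. It assumes that the "bwa-mem2" executable is on your PATH; the function name "bwamem2_align" and the output file name in the usage line are hypothetical. Note that BWA-MEM2 needs its own index, built with "bwa-mem2 index", and cannot reuse the index created by "bwa index".

```shell
# Hypothetical helper: index a reference with BWA-MEM2 and align paired-end reads.
# Arguments: reference FASTA, read file 1, read file 2, output SAM file.
bwamem2_align() {
    local ref="$1" r1="$2" r2="$3" out="$4"

    # Build the BWA-MEM2 index (only needs to be done once per reference)
    bwa-mem2 index "$ref" || return 1

    # Align with 4 threads; alignments go to the SAM file, messages to a log file
    bwa-mem2 mem -t 4 "$ref" "$r1" "$r2" > "$out" 2> "${out%.sam}.log"
}

# Usage (the output name SRR769545_mem2.sam is illustrative):
#   bwamem2_align refgenome/GRCh38.p13_ref.fna \
#       data/SRR769545_1.fastq.gz data/SRR769545_2.fastq.gz \
#       sam/SRR769545_mem2.sam
```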

2-BWA-SW

The BWA-SW [9] algorithm, like BWA-MEM, can also be used for the alignment of single- and paired-end long reads generated by all platforms. It uses the Smith-Waterman (SW) local alignment approach to map reads to a reference genome. BWA-SW has better sensitivity when alignments have frequent gaps. However, this algorithm has been deprecated by its developer since BWA-MEM has been restructured for better performance. The following “bwa bwasw” command performs read alignment as above:

bwa bwasw \
    -t 4 \
    refgenome/GRCh38.p13_ref.fna \
    data/SRR769545_1.fastq.gz \
    data/SRR769545_2.fastq.gz \
    > sam/SRR769545_bwasw.sam 2> sam/SRR769545_bwasw.log

You can convert this SAM file to a BAM file as we did above, or you can just delete it to save some space.

3-BWA-backtrack

The BWA-backtrack algorithm is designed for aligning Illumina short reads of up to 100 bp in length with sequencing error rates below 2%. It involves two steps: (i) using “bwa aln” to find the coordinates of the positions on the reference genome where the short reads align, and then (ii) generating alignments with “bwa samse” for single-end reads or “bwa sampe” for paired-end reads. The base call quality usually deteriorates toward the end of reads generated by Illumina instruments. This algorithm optionally trims low-quality bases from the 3′ end of the short reads before alignment. Therefore, it is able to align more reads that have a high error rate toward their 3′ ends.
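The two steps described above can be sketched for paired-end reads as follows. This is an illustrative outline only: the function name "bwa_backtrack_pe", the output file names, and the trimming threshold passed to "-q" are assumptions, and the reference is assumed to have been indexed with "bwa index" already.

```shell
# Hypothetical helper: run the two-step BWA-backtrack workflow on paired-end reads.
# Arguments: reference FASTA, read file 1, read file 2, output file prefix.
bwa_backtrack_pe() {
    local ref="$1" r1="$2" r2="$3" prefix="$4"

    # Step (i): find the alignment coordinates for each read file separately.
    # -q 15 trims low-quality bases from the 3' end (the value 15 is illustrative).
    bwa aln -t 4 -q 15 "$ref" "$r1" > "${prefix}_1.sai" || return 1
    bwa aln -t 4 -q 15 "$ref" "$r2" > "${prefix}_2.sai" || return 1

    # Step (ii): generate paired-end alignments in SAM format from the .sai files.
    bwa sampe "$ref" "${prefix}_1.sai" "${prefix}_2.sai" "$r1" "$r2" \
        > "${prefix}_aln.sam"
}

# Usage (the prefix sam/SRR769545 is illustrative):
#   bwa_backtrack_pe refgenome/GRCh38.p13_ref.fna \
#       data/SRR769545_1.fastq.gz data/SRR769545_2.fastq.gz \
#       sam/SRR769545
```

For single-end reads, a single "bwa aln" call followed by "bwa samse" would take the place of the three commands in the function.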